
Implement VAES AVX and AVX512 backends for aes #482


Merged: 1 commit merged on Jun 11, 2025

Conversation

@silvanshade (Contributor) commented May 29, 2025

Here is an updated implementation of VAES256 and VAES512 support for aes.

Benchmarks are from a Ryzen 9950X3D.

VAES512

RUSTFLAGS="-Ctarget-cpu=native" cargo +nightly bench

test aes128_decrypt_block  ... bench:         948.52 ns/iter (+/- 9.78) = 17282 MB/s
test aes128_decrypt_blocks ... bench:         249.35 ns/iter (+/- 0.38) = 65799 MB/s
test aes128_encrypt_block  ... bench:         951.15 ns/iter (+/- 7.81) = 17228 MB/s
test aes128_encrypt_blocks ... bench:         249.41 ns/iter (+/- 0.50) = 65799 MB/s
test aes128_new            ... bench:           8.99 ns/iter (+/- 0.01)
test aes192_decrypt_block  ... bench:       1,176.52 ns/iter (+/- 18.27) = 13931 MB/s
test aes192_decrypt_blocks ... bench:         304.76 ns/iter (+/- 20.07) = 53894 MB/s
test aes192_encrypt_block  ... bench:       1,206.06 ns/iter (+/- 39.16) = 13585 MB/s
test aes192_encrypt_blocks ... bench:         297.87 ns/iter (+/- 26.79) = 55164 MB/s
test aes192_new            ... bench:          11.45 ns/iter (+/- 0.60)
test aes256_decrypt_block  ... bench:       1,468.79 ns/iter (+/- 13.34) = 11160 MB/s
test aes256_decrypt_blocks ... bench:         344.53 ns/iter (+/- 3.02) = 47627 MB/s
test aes256_encrypt_block  ... bench:       1,472.97 ns/iter (+/- 24.24) = 11130 MB/s
test aes256_encrypt_blocks ... bench:         346.99 ns/iter (+/- 2.39) = 47352 MB/s
test aes256_new            ... bench:          12.31 ns/iter (+/- 0.02)

VAES256

RUSTFLAGS="-Ctarget-cpu=native --cfg aes_avx512_disable" cargo +nightly bench

test aes128_decrypt_block  ... bench:         916.29 ns/iter (+/- 12.56) = 17886 MB/s
test aes128_decrypt_blocks ... bench:         443.50 ns/iter (+/- 4.39) = 36984 MB/s
test aes128_encrypt_block  ... bench:         918.70 ns/iter (+/- 17.38) = 17847 MB/s
test aes128_encrypt_blocks ... bench:         451.14 ns/iter (+/- 1.01) = 36328 MB/s
test aes128_new            ... bench:           8.82 ns/iter (+/- 0.01)
test aes192_decrypt_block  ... bench:       1,141.80 ns/iter (+/- 8.07) = 14359 MB/s
test aes192_decrypt_blocks ... bench:         542.46 ns/iter (+/- 6.65) = 30228 MB/s
test aes192_encrypt_block  ... bench:       1,143.93 ns/iter (+/- 11.54) = 14334 MB/s
test aes192_encrypt_blocks ... bench:         540.67 ns/iter (+/- 2.86) = 30340 MB/s
test aes192_new            ... bench:          10.83 ns/iter (+/- 0.01)
test aes256_decrypt_block  ... bench:       1,434.76 ns/iter (+/- 20.45) = 11425 MB/s
test aes256_decrypt_blocks ... bench:         629.03 ns/iter (+/- 0.54) = 26047 MB/s
test aes256_encrypt_block  ... bench:       1,432.48 ns/iter (+/- 21.32) = 11441 MB/s
test aes256_encrypt_blocks ... bench:         629.17 ns/iter (+/- 3.79) = 26047 MB/s
test aes256_new            ... bench:          11.90 ns/iter (+/- 0.54)

AES-NI

RUSTFLAGS="-Ctarget-cpu=native --cfg aes_avx512_disable --cfg aes_avx256_disable" cargo +nightly bench

test aes128_decrypt_block  ... bench:         949.50 ns/iter (+/- 14.32) = 17264 MB/s
test aes128_decrypt_blocks ... bench:         937.90 ns/iter (+/- 8.32) = 17485 MB/s
test aes128_encrypt_block  ... bench:         952.43 ns/iter (+/- 12.84) = 17210 MB/s
test aes128_encrypt_blocks ... bench:         938.51 ns/iter (+/- 11.09) = 17466 MB/s
test aes128_new            ... bench:           8.96 ns/iter (+/- 0.02)
test aes192_decrypt_block  ... bench:       1,174.35 ns/iter (+/- 11.97) = 13955 MB/s
test aes192_decrypt_blocks ... bench:       1,139.37 ns/iter (+/- 3.99) = 14384 MB/s
test aes192_encrypt_block  ... bench:       1,176.52 ns/iter (+/- 9.51) = 13931 MB/s
test aes192_encrypt_blocks ... bench:       1,139.30 ns/iter (+/- 4.89) = 14384 MB/s
test aes192_new            ... bench:          11.08 ns/iter (+/- 0.01)
test aes256_decrypt_block  ... bench:       1,470.33 ns/iter (+/- 22.56) = 11145 MB/s
test aes256_decrypt_blocks ... bench:       1,327.11 ns/iter (+/- 15.53) = 12346 MB/s
test aes256_encrypt_block  ... bench:       1,432.09 ns/iter (+/- 10.22) = 11441 MB/s
test aes256_encrypt_blocks ... bench:       1,288.63 ns/iter (+/- 11.47) = 12720 MB/s
test aes256_new            ... bench:          12.25 ns/iter (+/- 0.31)

@silvanshade silvanshade mentioned this pull request May 29, 2025
@tarcieri tarcieri requested a review from newpavlov May 29, 2025 18:01
@silvanshade force-pushed the vaes branch 4 times, most recently from ae7ec9c to 93e30c4 on May 31, 2025 01:06
@newpavlov (Member) commented Jun 2, 2025

@tarcieri
Maybe we should make the VAES support a nightly-only feature enabled via configuration flags? With the current implementation, round keys and blocks cannot be kept in registers and have to be spilled to the stack. I have concerns that supporting VAES by default like this would result in inefficient code when combined with other crates like ctr and ghash.

@tarcieri (Member) commented Jun 2, 2025

I think making VAES opt-in via cfg is fine for getting this PR merged. Note, however, that AVX-512 support just passed FCP, and avx512_target_feature will be stable in Rust 1.89, I believe, so nightly will soon be unnecessary.

@newpavlov (Member):

In that case, I think we should release aes v0.9.0 without VAES support and shortly after release aes v0.9.1 with intrinsics-based VAES support. This would allow users who don't use the latest stable to use v0.9.0 thanks to the MSRV-aware resolver.
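For context, the MSRV-aware resolver keys off Cargo's `rust-version` field. A sketch of what a hypothetical v0.9.1 manifest might declare (the version numbers are illustrative, not decided in this thread):

```toml
[package]
name = "aes"
version = "0.9.1"
# Hypothetical MSRV, assuming avx512_target_feature stabilizes in Rust 1.89:
rust-version = "1.89"
```

Since Cargo 1.84, the resolver by default falls back to the newest release whose `rust-version` is compatible with the active toolchain, so users on an older stable would keep resolving to v0.9.0.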

@tarcieri (Member) commented Jun 2, 2025

I'm not sure how long it will take to stabilize all the relevant intrinsics. If they're through FCP, hopefully soon? If they take a while, we can polyfill them with asm!

@newpavlov (Member) commented Jun 2, 2025

The intrinsics (e.g. _mm512_aesenc_epi128) link to the stdarch_x86_avx512 issue, so I assumed the intrinsics will be stabilized as part of this FCP. The relevant target features have already been stabilized: rust-lang/rust#138940

aes/Cargo.toml (outdated)

```diff
@@ -31,7 +31,7 @@ hazmat = [] # Expose cryptographically hazardous APIs

 [lints.rust.unexpected_cfgs]
 level = "warn"
-check-cfg = ["cfg(aes_compact)", "cfg(aes_force_soft)"]
+check-cfg = ["cfg(aes_compact)", "cfg(aes_force_soft)", "cfg(avx256_disable)", "cfg(avx512_disable)"]
```
Member:

Perhaps namespace these configs under aes_* like the others?

Suggested change:

```diff
-check-cfg = ["cfg(aes_compact)", "cfg(aes_force_soft)", "cfg(avx256_disable)", "cfg(avx512_disable)"]
+check-cfg = ["cfg(aes_compact)", "cfg(aes_force_soft)", "cfg(aes_avx256_disable)", "cfg(aes_avx512_disable)"]
```

Member:

Ideally, this would be handled with "negative" target features, but alas. :/

@newpavlov (Member) commented Jun 2, 2025

It may also be worth naming it just aes_disable_vaes. It does not look like we have different target features for the VAES256 and VAES512 intrinsics, so it may be worth just removing the vaes256 backend.

UPD: Ah, the VAES512 intrinsics are also gated on avx512f. Though I wonder if there are CPUs which support VAES256 but not VAES512.

Member:

I looked into it and it looks like some CPU families support VAES256, but not VAES512 (e.g. Zen 3).

@silvanshade (Contributor, Author) commented Jun 2, 2025

> It also may be worth to name it just aes_disable_vaes. It does not look like we have different target features for VAES256 and VAES512 intrinsics, so it may be worth to just remove the vaes256 backend.

Intel started disabling AVX-512 on Alder Lake CPUs, if I remember correctly, and had at one point stated in their roadmap that they intended not to include it on newer consumer-oriented CPUs. That was another reason I included VAES256.

It sounds like maybe Intel's plans have changed though: https://www.phoronix.com/news/Intel-AVX10-Drops-256-Bit

But I also use the VAES256 backend as a fallback for encrypting tail blocks:

```rust
#[inline]
fn encrypt_tail_blocks(&self, blocks: InOutBuf<'_, '_, Block>) {
    let backend = self;
    let mut rem = blocks.len();
    let (mut iptr, mut optr) = blocks.into_raw();
    let backend = &$name_backend::Vaes256::from(backend);
    if backend.features.has_vaes256() {
        while rem >= backend.par_blocks() {
            let blocks = unsafe { InOut::from_raw(iptr.cast(), optr.cast()) };
            backend.encrypt_par_blocks(blocks);
            rem -= backend.par_blocks();
            iptr = unsafe { iptr.add(backend.par_blocks()) };
            optr = unsafe { optr.add(backend.par_blocks()) };
        }
    }
    let backend = &$name_backend::Ni::from(backend);
    while rem >= backend.par_blocks() {
        let blocks = unsafe { InOut::from_raw(iptr.cast(), optr.cast()) };
        backend.encrypt_par_blocks(blocks);
        rem -= backend.par_blocks();
        iptr = unsafe { iptr.add(backend.par_blocks()) };
        optr = unsafe { optr.add(backend.par_blocks()) };
    }
    while rem > 0 {
        let block = unsafe { InOut::from_raw(iptr, optr) };
        backend.encrypt_block(block);
        rem -= 1;
        iptr = unsafe { iptr.add(1) };
        optr = unsafe { optr.add(1) };
    }
}
```

@silvanshade (Contributor, Author):

I renamed the config flags with the aes_ prefix as suggested.

@silvanshade (Contributor, Author):

> In that case, I think we should release aes v0.9.0 without VAES support and shortly after release aes v0.9.1 with intrinsics-based VAES support. This would allow users who don't use the latest stable to use v0.9.0 thanks to the MSRV-aware resolver.

Just so we understand each other: I have no intention of refactoring this code further, especially not a full rewrite using intrinsics, without first seeing:

  1. A full review of what is currently in this PR.
  2. A clear outline of what conditions need to be met for the PR to be merged.

@tarcieri (Member) left a comment:

I think we should just merge this and address the remaining problems in follow-up PRs.

@silvanshade (Contributor, Author):

> I think we should just merge this and address the remaining problems in follow-up PRs.

I think that would be a good approach.

I'm willing to help maintain and continue to improve the code if it's going to be used.

@tarcieri (Member) commented Jun 10, 2025

@newpavlov does that sound good to you?

Edit: since it's nightly-only, that effectively means it's unstable anyway. We can call it out as such until we can stabilize it, but I think retaining the cfg gating would still be good until we're actually sure about the performance implications of having it on by default.

@newpavlov (Member):

Personally, I would prefer to have a proper intrinsics-based implementation released in v0.9.1, but if there is a desire to have AVX-512 support in v0.9.0, I am fine with merging this. In the latter case, though, I believe it should be an experimental nightly-only backend gated behind a cfg flag. Plus, I don't think it makes sense to have the OnceCell caching of broadcasted keys; I think it would only obstruct compiler optimizations. But since @silvanshade did not give us rights to edit this PR, we can change it in later PRs.

@newpavlov newpavlov merged commit ad83428 into RustCrypto:master Jun 11, 2025
163 checks passed
@silvanshade (Contributor, Author):

First, thanks for finally merging the PR.

> Plus, I don't think it makes sense to have the OnceCell caching of broadcasted keys, I think it only would obstruct compiler optimizations.

That design has been in place since the original PR, which is a long time ago now. You could have reviewed the design, performed your own benchmarks, and proposed a better alternative.

I would have welcomed feedback on a better design.

Unless the cipher API is redesigned to separate the backends, I don't see how you avoid a compromise.

If you always store the broadcast keys, you get a very measurable slowdown for the non-VAES case which I showed in the previous thread. If you rebroadcast for every chunk of blocks for VAES, you also get a slowdown from unnecessary work.

Caching the keys on demand seems to alleviate those performance hits in the benchmarks.

> But since @silvanshade did not give us rights to edit this PR, we can change it in later PRs.

You could have asked me to enable the option for maintainer commits if you had changes you wanted to make.

I would have expected we would have discussed those in a review first though.

@tarcieri (Member):

Now that we have the core design merged, we can experiment with those details without them blocking the PR.
